Rancho Cordova
Monet: Mixture of Monosemantic Experts for Transformers
Park, Jungwoo, Ahn, Young Jin, Kim, Kee-Eung, Kang, Jaewoo
Understanding the internal computations of large language models (LLMs) is crucial for aligning them with human values and preventing undesirable behaviors like toxic content generation. However, mechanistic interpretability is hindered by polysemanticity -- where individual neurons respond to multiple, unrelated concepts. While Sparse Autoencoders (SAEs) have attempted to disentangle these features through sparse dictionary learning, they have compromised LLM performance due to reliance on post-hoc reconstruction loss. To address this issue, we introduce Mixture of Monosemantic Experts for Transformers (Monet) architecture, which incorporates sparse dictionary learning directly into end-to-end Mixture-of-Experts pretraining. Our novel expert decomposition method enables scaling the expert count to 262,144 per layer while total parameters scale proportionally to the square root of the number of experts. Our analyses demonstrate mutual exclusivity of knowledge across experts and showcase the parametric knowledge encapsulated within individual experts. Moreover, Monet allows knowledge manipulation over domains, languages, and toxicity mitigation without degrading general performance. Our pursuit of transparent LLMs highlights the potential of scaling expert counts to enhance mechanistic interpretability and directly resect the internal knowledge to fundamentally adjust model behavior. The source code and pretrained checkpoints are available at https://github.com/dmis-lab/Monet.
- Europe > United Kingdom > England > Staffordshire (0.04)
- North America > United States > Florida (0.04)
- Oceania > New Zealand (0.04)
- (32 more...)
- Law (1.00)
- Banking & Finance (1.00)
- Government > Regional Government (0.68)
- Health & Medicine > Therapeutic Area > Oncology (0.45)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)
Amy Schumer raises awareness about adult autism spectrum disorder through her husband's recent diagnosis
Amy Schumer believes that attractive women struggle to maintain normal lives. Jennifer Eckhart and Tom Cotter explain why they disagree. Comedian Amy Schumer revealed last week on "The Ellen DeGeneres Show" that her 42-year-old husband, chef Chris Fischer, was diagnosed with autism spectrum disorder (ASD) as an adult, which is helping raise awareness that the condition can also be diagnosed as we get older. ASD " … is a complex, lifelong developmental condition that typically appears during early childhood and can impact a person's social skills, communication, relationships, and self-regulation," according to the Autism Society. "It is defined by a certain set of behaviors and is often referred to as a'spectrum condition' that affects people differently and to varying degrees."
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > California > Sacramento County > Rancho Cordova (0.05)